perf+chore: FxHashMap on hot scoring tables (-10% Astral, -15% TMT) + post-cutover cleanup by ypriverol · Pull Request #36 · bigbio/msgf-rust

ypriverol · 2026-05-26T18:06:02Z

Summary

Post-cutover code-quality foundation + a profile-driven perf win.

Headline: scoring Param hot tables swap from HashMap (SipHash13) to FxHashMap (rustc-hash). 36.82% of CPU was in HashMap<Partition, ...> lookups; the swap reduces hash overhead 3-5x.

PR-V1 replaces the closed PR #35 (PR-Q1, cleanup-only) — same cleanup commits, plus the measurable perf win.

Bench results (Percolator, controlled VM load 1m=0.70)

Wall time

Dataset	Mode	dev baseline	This PR	Δ
LFQ (PXD001819)	off	0:46	0:47	+1% (noise)
LFQ	auto	1:00	1:01	+1% (noise)
Astral	off	6:12	5:33	-10.7%
Astral	auto	6:54	6:19	-8.5%
TMT	off	2:59	2:31	-15.6%
TMT	auto	3:21	2:51	-15.5%

PSMs @1% FDR

Identical to dev on all 6 cells (FxHashMap doesn't change scoring values; only HashMap iteration order, which .get() lookups don't depend on). Bit-identical regression gate green.

Commits

Commit	Group	What
`a8ad6ddd`	pre-Q1	Remove BUG_REVIEW.md; move CLI_MIGRATION.md to docs/
`55cff3fa`	pre-Q1	Design spec for the cleanup
`84f83295`	cleanup	Scrub 33 dangling `Xxx.java:LINE` references
`ba4c6b34`	cleanup	Neutralize "port of MS-GF+" framing
`f0831b03`	cleanup	Rename `MSGFRUST_RSS_PROBE` -> `MSGF_RSS_PROBE`
`20da1b4e`	cleanup	Fix all clippy lib warnings
`67316e56`	cleanup	Lift clippy CI gate to required (`--all-targets -D warnings`)
`2e1b6c7e`	cleanup	Remove shipped iter39 spec+plan
`ea1f481f`	cleanup	Address review observations (once-guard on deprecation warn)
`28e7a65a`	docs	PR-V1 design spec
`542ab6e2`	S1	`HashMap` -> `FxHashMap` on hot Param tables
`67002e42`	S2 (dropped)	Cal threshold fallback 1e-6 -> 1e-5
`09824bde`	S2 revert	S2 produced only +22 LFQ PSMs vs +50 gate; reverted

Dropped sub-features

S2 (MassCalibrator threshold fallback): produced +22 LFQ PSMs (gate required +50). Mechanism worked but the relaxed-threshold residual median is biased toward 0, halving the learned shift vs Java. Reverted.
S3 (PrecursorErrorPpmSquared PIN column): not implemented. Likely-flat additive column per n=9 audit pattern; not worth the goldens regen.

Why this PR ships

The cleanup commits (PR-Q1) on their own didn't deliver measurable value (closed PR #35). PR-V1 adds the profile-driven S1 perf win that does move the needle: 8-15% wall reduction across Astral/TMT under controlled load, zero PSM regression.

Out of scope (next PR)

I5 (score_psm trace investigation) — root cause of the LFQ/TMT vs Java PSM gap. 2026-05-20 memory note documents a 20-24 point per-PSM scoring divergence (Rust scores Java's favored peptide at 14 vs Java's 38 on SAME spectrum+peptide). This is the real path to +5% per dataset. Brainstormed next as a research PR.
Rustfmt cleanup (~11k lines)

Verification

cargo clippy --workspace --all-targets -- -D warnings clean
Workspace tests green under existing CI skip list
precursor_cal_bit_identical regression gate green
3-dataset bench under controlled VM load (results above)
CodeRabbit review pass
CI matrix (Linux/macOS/Windows) green

- BUG_REVIEW.md was a transient artifact from the post-merge bug-hunt review; the actual fixes shipped in PR #32 and are documented in the PR description / commit history. - CLI_MIGRATION.md is an audience-specific guide (Java MS-GF+ users porting CLI invocations) — belongs under docs/, not at root. - DOCS.md stays at root as the primary single-file reference (per the iter39 docs-rewrite design). - Updated inbound references in README.md and DOCS.md.

Adds the design spec for PR-Q1 (post-cutover code quality sweep) and finalizes the inbound-reference updates from the prior commit (docs/CLI_MIGRATION.md links) that weren't staged at that point. PR-Q1 is the first of three sequential sub-projects (quality -> speed -> ID-rate +5% per dataset). Decomposed during the 2026-05-26 brainstorm because the three concerns differ in risk, scope, and time profile; the ID-rate target is a multi-PR research project, not a single ship gate. Scope: 7 groups (6 in-PR + 1 out-of-repo memory update). Dangling .java:LINE refs (42), stale "port of MS-GF+" framing, identifier renames (MSGFRUST_RSS_PROBE etc.), 26 clippy warnings, lift CI lint to required, remove shipped design specs.

The Java source tree was removed in commit b4565b8 during the Rust-cutover; the inline citations to specific Java line numbers now point at code that does not exist in this repo. Replace each citation with intent-only "Java parity" comments. Preserves semantic meaning; removes the broken hyperlinks. Parity-test files (tests/*_java_parity.rs, tests/gf_bsa_parity.rs, tests/*_match_java.rs) untouched — their identity is Java parity and the citations are load-bearing documentation. 8 non-test files touched, 33 refs replaced, 0 functional changes.

The codebase is post-cutover; new contributors should read crate-lib top-of-file doc comments as descriptions of what each crate does, not as port-bookkeeping. CLI --help and enum doc comments that compared behavior to Java's command-line options now describe behavior directly. KEEPS user-facing provenance: - README.md and DOCS.md project-lineage sections - Legacy numeric flag values (Java MS-GF+ -X) in --help (user migration) - (Java -precursorCal) in precursor_cal doc (exact flag name we mirror) - docs/parity-analysis/** content - Parity test files Touched: 5 crate-lib headers + msgf-rust CLI framing.

The "MSGFRUST_" prefix dates from an early iter-era naming and does not match the binary's identity (msgf-rust). Switch the primary name to MSGF_RSS_PROBE and accept the legacy name for this release with a one-line deprecation warning on stderr. The legacy name will be removed in the next quality cleanup. Side-effect-only env var; no functional change to search/scoring.

Brings the workspace to clippy-clean on stable 1.87.0 so the CI lint job can be lifted from advisory to required in the next commit. Changes by class: - map_or simplifications: mechanical rewrite via clippy --fix - complex-type aliases: SegmentPartitionCache, SegmentPartitionSlice, DeconvResult, and RankKeptCtx struct in crates/scoring/src/scoring/scored_spectrum.rs - too_many_arguments: RankKeptCtx context struct in scored_spectrum.rs; #[allow] with reason for directional_node_score_inner, write_tsv, write_tsv_to, write_spectrum_rows, and compute_cleavage_credit - doc-list indentation: add blank line after list / fix continuation indent at 15 sites in msgf-rust.rs and scored_spectrum.rs - unused_mut, ? rewrite, manual split_once, loop-counter: via clippy --fix - needless_range_loop: suppressed with reason (seg indexes cache AND serves as fallback partition_for arg) No functional behavior change; PIN/TSV bit-identical regression gate in tree (precursor_cal_bit_identical) is the verification.

After PR-Q1 Task 4 left the workspace clippy-clean on --lib targets, remove continue-on-error from the lint job's clippy step and extend the lint command to --all-targets (covers tests + examples + bin in addition to lib). Also addresses 5 residual warnings in test/example targets that the --lib-only fix in Task 4 didn't reach: - crates/scoring/examples/dump_main_ion.rs: struct field shorthand - crates/scoring/examples/dump_prefix_cache.rs: needless_range_loop - crates/scoring/tests/add_prob_dist_chunked_parity.rs: unnecessary parens Rustfmt remains advisory (~11k lines of fmt churn pending; separate cleanup). Lint job now blocks PRs on clippy regressions.

Deletes the iter39 docs-rewrite spec and plan (shipped 2026-05-23 via PR #30; the rewrite is in dev and being relied on, so the design docs no longer need to be discoverable in the repo). Their lineage is in git history. Tracks the in-flight PR-Q1 implementation plan alongside its design spec (committed in 55cff3f). Future protocol: when a docs/specs design file references a feature that has fully shipped and closed any deferred gate, remove it in the next quality cleanup.

Three non-blocking observations from the final code review: 1. DOCS.md §97 documented only the legacy MSGFRUST_RSS_PROBE name. Now mentions MSGF_RSS_PROBE as the canonical with the legacy noted. 2. crates/model/src/amino_acid.rs:13 inline comment referenced the legacy name; updated to MSGF_RSS_PROBE. 3. log_rss deprecation warning fired on every call when only the legacy env var was set. Guard with std::sync::Once so it prints exactly once per process invocation. All non-functional; verification: deprecation warning count is now 1 under MSGFRUST_RSS_PROBE=1 + multiple log_rss checkpoints.

After PR #35 (PR-Q1) closed unmerged for not delivering measurable wins, pivot strategy: stack 3 loosely-coupled sub-features on top of the cleanup commits and ship ONE PR with bench-gated value. Sub-features: - S1: profile-guided Astral wall reduction (gate: -5% wall) - S2: LFQ calibrator threshold fallback 1e-6 -> 1e-5 (gate: +50 PSMs) - S3: additive PrecursorErrorPpmSquared PIN column (gate: +50 PSMs on any one dataset) Each sub-feature ships only if its bench gate passes; failures get dropped before merge.

Profile (perf record on Astral cal=off, 285K samples) identified 36.82% of CPU in HashMap<Partition, ...> lookups using SipHash13. The hot scoring path (compute_inner, directional_node_score_inner, rank_scorer::error_score) repeatedly looks up Partition keys for rank_dist_table, frag_off_table, ion_existence_table, etc. Switch the 7 Param hot tables to FxHashMap (rustc-hash). SipHash13's state-init + 13-round mix is unnecessary for non-cryptographic keys on a single-process search; FxHasher is a single multiply-and-xor. Same .get/.insert API; only iteration order differs (and no hot path iterates these tables). Expected: ~25-30% reduction in match_spectra wall on Astral cal=off. Bench gate (S1) requires >= 5% Astral wall reduction to ship.

@1

If the strict SpecEValue threshold (1e-6) does not qualify MIN_CONFIDENT_PSMS (200) in the cal pre-pass, retry once at 1e-5 before giving up. Preserves Java parity on datasets where 1e-6 already succeeds (Astral, TMT); recovers LFQ-shaped distributions where Rust's SpecE-tail drift leaves the cal pre-pass a few PSMs short (LFQ ships at 193/200 in PR-A). Median-of-residuals + MAD-based robust sigma are robust to one decade of noisier outliers; the fallback is a one-shot retry, not a baseline threshold change. Bench gate (S2): LFQ auto @1% FDR >= 14,805 (baseline 14,755 + 50).

This reverts commit 67002e4.

qodo-code-review · 2026-05-26T18:06:07Z

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

coderabbitai · 2026-05-26T18:06:10Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1327cab1-24a0-4ef3-9379-9c54f4843342

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/quality-perf-id-rate

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Target 1 (roundf elimination): the perf flamegraph (Astral cal=off, post-PR-V1 FxHashMap) shows 5.15% CPU in libc roundf called from f32::round() / f64::round() in scoring hot paths. Replace with the branchless `(x + 0.5.copysign(x)) as i32` idiom. Mathematically equivalent for finite non-edge values: - x > 0: (x + 0.5) truncated == round_half_away_from_zero - x < 0: (x - 0.5) truncated == round_half_away_from_zero - x = 0: 0 + 0 = 0 in both forms Skipped: match_engine.rs:256,257,679,680 (`(tol - 0.4999).round()`) which is a window-widening adjustment with a different semantic, not the round-to-nearest idiom. Target 2 (GF DP inner-loop): compute_inner is already tightly written; added a comment documenting why no structural change was made and noting the next opportunities (prev_idx caching alongside valid_edges, SIMD widening of the inner multiply loop). No functional behavior change; bit-identical PIN regression gate green.

… tightening" This reverts commit 9a6607a.

Re-profile on the PR-V1 binary (Astral cal=off, 245K samples) showed the FxHashMap swap in scoring::Param did NOT close the dominant hashing hotspot. 39.35% of CPU was still in the FnMut::call_mut chain, which traces to: expand_mod_combinations -> AminoAcidSet::variants_for -> HashMap<(u8, ModLocation), Vec<AminoAcid>>::get -> hashbrown find_inner with std::hash::random::RandomState (SipHash13) PR-V1 missed this map because it lives in the `model` crate, not `scoring`. The candidate enumeration calls variants_for once per peptide position per candidate; on Astral this dominates wall. This commit: - Adds rustc-hash as a model-crate dependency. - Switches AminoAcidSet::table and aa_lists_cache from HashMap to FxHashMap. Same hashbrown internals; FxHasher is a multiply-and-xor vs SipHash13's 13-round mix. No functional behavior change; PIN bit-identical regression gate green. Tests pass. Expected wall reduction on Astral cal=off: -10..-25%, depending on how much of the 39% chain comes from the SipHash vs the surrounding Arc clones and Vec extends.

…enum A perf trace on the PR-V1 binary (Astral cal=off, 245K samples) showed 12.63%+ of CPU under `to_vec<AminoAcid>` and `Arc<Modification>::clone` within the `expand_mod_combinations` -> `variants_for` chain. The candidate enumerator was cloning the full Anywhere-variants Vec for every interior position of every candidate peptide. For a length-L peptide, L-2 of L positions had no terminal merging to do (typically ~87% of positions on real peptides) and the clone was pure waste. Refactor: only the 1-2 terminal positions (pos 0 and pos L-1) build owned merged variants via a new `build_terminal_variants` helper. Interior positions reference the AminoAcidSet's Anywhere-variants slice directly. `expand_recursive`'s signature changes from `&[Vec<AminoAcid>]` to `&[&[AminoAcid]]`. No functional behavior change; bit-identical regression gate green. Expected wall reduction on Astral cal=off: -5..-15%, depending on how much of the 12.63% chain is the `to_vec` vs the surrounding recursion overhead.

ypriverol · 2026-05-27T10:55:34Z

iter2 + iter3 added on top of S1

Two more profile-driven commits stacked on the PR. Both are bit-identical to S1's output (sorted-row PIN match on all 6 dataset/mode cells); they only change how the work runs.

Commits

Commit	What
`319af81e`	iter2 — `HashMap` -> `FxHashMap` on `AminoAcidSet` hot tables (`table`, `aa_lists_cache`). Found via re-profile after S1: PR-V1 swapped the scoring crate's maps but missed this one in `model::aa_set`.
`096dbca9`	iter3 — eliminate per-interior-position `Vec<AminoAcid>` clone in candidate enumeration. `expand_mod_combinations` was cloning the same `AminoAcidSet::Anywhere` slice once per interior position (~87% of L positions). New `build_terminal_variants` helper builds only the 2 terminal Vecs; interior positions borrow the underlying slice via `&[AminoAcid]`. Biggest gain on TMT (densest mod-variant counts).

Bench (iter3 HEAD vs S1b baseline, both runs on the same VM)

Dataset	Mode	S1b wall	iter3 wall	Δ
LFQ (PXD001819)	off	0:46.78	0:44.04	-5.9%
LFQ	auto	1:01.08	0:57.92	-5.2%
Astral	off	5:32.53	5:44.80	+3.7% (Java load +1.3% noisier; net ~+2.4%)
Astral	auto	6:19.09	6:16.80	-0.6%
TMT	off	2:31.29	2:15.88	-10.2%
TMT	auto	2:50.50	2:35.17	-9.0%

Cumulative dev → iter3 HEAD wall

Dataset	Mode	dev	iter3	Δ
LFQ	off	0:46	0:44	-4%
LFQ	auto	1:00	0:58	-3%
Astral	off	6:12	5:45	-7.3%
Astral	auto	6:54	6:17	-8.9%
TMT	off	2:59	2:16	-24.0%
TMT	auto	3:21	2:35	-22.9%

PSMs

All 6 dataset/mode cells produce PIN files that are sorted-row identical to S1b. PSM counts are inherited unchanged: Astral 36,138/36,715 · LFQ 14,755/14,755 · TMT 9,364/9,605.

Hot paths cross crate boundaries (search -> scoring -> model). Default release profile uses codegen-units=16 with no LTO, so LLVM cannot inline across crates and small leaf functions (AminoAcid::nominal_mass, Enzyme::is_cleavable, FxHasher::write_u32, etc.) stay as function calls across the hot search loop. LTO=fat + cgu=1 give LLVM a whole-binary view at link time so it can inline cross-crate, deduplicate similar generic instantiations, and pick better register allocations on hot paths. Build time goes up (~2-3x) and binary size grows slightly, but this is a release-time cost only. No `target-cpu=native` here -- the released binary must run on a baseline x86_64 (e.g. older bench VMs). Future bench-only builds can opt in with `RUSTFLAGS="-C target-cpu=native"`. The bit-identical regression gate (precursor_cal_off_pin_tsv_match_golden_after_sort) is green under the new profile -- LLVM's whole-program view does not change float semantics.

ypriverol · 2026-05-27T12:49:30Z

iter4 — LTO=fat + codegen-units=1 (build profile, no code change)

Pure release-profile change in Cargo.toml. The previous release build used cargo defaults (codegen-units=16, no LTO), so cross-crate inlining was disabled. Now LLVM gets a whole-binary view at link time.

Verification

Bit-identical to S1b on all 6 dataset/mode cells (sorted-row PIN match) → Percolator @1% PSM counts unchanged
Binary size: 2.3M → 1.9M (-17%)
Build time: 12s incremental → 33s clean LTO link

Bench (iter4 vs iter3, same VM, Java drift ≤ 1.2%)

Dataset	Mode	iter3	iter4	Δ
Astral	java	5:41.22	5:45.02	+1.1% (load)
Astral	rust-off	5:44.80	6:00.11	+4.4% ⚠️
Astral	rust-auto	6:16.80	5:58.14	-5.0%
PXD001819	java	1:23.44	1:22.84	-0.7%
PXD001819	rust-off	0:44.04	0:43.82	flat
PXD001819	rust-auto	0:57.92	0:53.68	-7.3%
TMT	java	2:58.16	2:56.75	-0.8%
TMT	rust-off	2:15.88	2:16.39	flat
TMT	rust-auto	2:35.17	2:31.45	-2.4%

Read

Mixed. Slight wins on cal=auto modes (-2% to -7%), Astral cal=off regressed +3% beyond Java's load drift. Build-config changes shouldn't slow steady-state code; this is most likely noise on the longer Astral run, but the lack of a clean win means LTO isn't the next-step lever for the wall-time target.

Keeping iter4 because:

Binary is 17% smaller (unambiguous ship win)
Production-quality release settings — industry standard
Zero correctness risk (bit-identical PIN gate green)

Next: re-profile suggests GF DP compute_inner at 16% of leaf time is the largest single target. Lower-priority leaves: directional_node_score_inner 9%, roundf libm calls 6%, PrimitiveAaGraph::build_in_place 5%. The 4-wide chunked inner loop in compute_inner is already vectorization-friendly — target-cpu=native (AVX2/FMA) would unlock it but risks bit-identity to S1b's golden PINs.

ypriverol added 13 commits May 26, 2026 09:04

Revert "feat(s2): MassCalibrator threshold fallback 1e-6 -> 1e-5"

09824bd

This reverts commit 67002e4.

ypriverol added 4 commits May 27, 2026 08:59

Revert "perf(scoring): branchless f32/f64 rounding + GF DP inner-loop…

d6a869d

… tightening" This reverts commit 9a6607a.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf+chore: FxHashMap on hot scoring tables (-10% Astral, -15% TMT) + post-cutover cleanup#36

perf+chore: FxHashMap on hot scoring tables (-10% Astral, -15% TMT) + post-cutover cleanup#36
ypriverol wants to merge 18 commits into
devfrom
feat/quality-perf-id-rate

ypriverol commented May 26, 2026

Uh oh!

qodo-code-review Bot commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Review skipped

Uh oh!

ypriverol commented May 27, 2026

Uh oh!

ypriverol commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ypriverol commented May 26, 2026

Summary

Bench results (Percolator, controlled VM load 1m=0.70)

Wall time

PSMs @1% FDR

Commits

Dropped sub-features

Why this PR ships

Out of scope (next PR)

Verification

Uh oh!

qodo-code-review Bot commented May 26, 2026

Qodo reviews are paused for this user.

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

ypriverol commented May 27, 2026

iter2 + iter3 added on top of S1

Commits

Bench (iter3 HEAD vs S1b baseline, both runs on the same VM)

Cumulative dev → iter3 HEAD wall

PSMs

Uh oh!

ypriverol commented May 27, 2026

iter4 — LTO=fat + codegen-units=1 (build profile, no code change)

Verification

Bench (iter4 vs iter3, same VM, Java drift ≤ 1.2%)

Read

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 26, 2026 •

edited

Loading